Syntax-aware Data Augmentation for Neural Machine Translation

Abstract

Data augmentation is an effective method for the performance enhancement of neural machine translation (NMT) by generating additional bilingual data. In this paper, we propose a novel data augmentation strategy for neural machine translation. Unlike existing data augmentation methods that simply modify words with the same probability across different sentences, we introduce a sentence-specific probability approach for word selection based on the syntactic roles of words in the sentence. Our motivation is to consider a linguistics-motivated method to obtain more ingenious language generation rather than relying on computation-motivated approaches only. We argue that high-quality aligned bilingual data is crucial for NMT, and that computation-motivated data augmentation alone is insufficient to provide good enough extra data. Our approach leverages dependency parse trees of input sentences to determine the selection probability of each word in the sentence, using three different functions to calculate probabilities for words at different depths. Besides, our method also revises the probability for words by considering the sentence length. We evaluate our method on multiple translation tasks. The experimental results demonstrate that our proposed data augmentation method does effectively boost existing sentence-independent methods for significant improvement of translation performance. Furthermore, an ablation study shows that our method does select fewer essential words and preserves the syntactic structure.
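The abstract does not spell out the three depth functions or the length revision, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a dependency parse given as a list of head indices (`-1` for the root), three hypothetical depth-to-weight functions (`exponential`, `linear`, `logarithmic`), and a hypothetical `rate` parameter that scales the expected number of selected words with sentence length. Deeper words receive higher selection probability, on the assumption that words near the root are more essential to the sentence.

```python
import math
from typing import Callable, List


def token_depths(heads: List[int]) -> List[int]:
    """Depth of each token in a dependency tree (root has depth 1).

    heads[i] is the index of token i's parent, or -1 if token i is the root.
    """
    depths = []
    for i in range(len(heads)):
        node, depth = i, 1
        while heads[node] != -1:  # walk up to the root, counting edges
            node = heads[node]
            depth += 1
        depths.append(depth)
    return depths


# Hypothetical depth-to-weight functions (assumptions, not taken from the paper).
# All three give the root (depth 1) a weight of zero and grow with depth.
def exponential(depth: int) -> float:
    return 1.0 - 0.5 ** (depth - 1)


def linear(depth: int) -> float:
    return float(depth - 1)


def logarithmic(depth: int) -> float:
    return math.log(depth)


def selection_probs(
    heads: List[int],
    weight_fn: Callable[[int], float] = exponential,
    rate: float = 0.15,
) -> List[float]:
    """Per-token probability of being chosen for modification.

    Assumption: the expected number of selected tokens is rate * sentence
    length, distributed across tokens in proportion to their depth weight.
    """
    n = len(heads)
    weights = [weight_fn(d) for d in token_depths(heads)]
    total = sum(weights)
    if total == 0:  # e.g. a one-word sentence: fall back to a uniform rate
        return [rate] * n
    budget = rate * n
    return [min(1.0, budget * w / total) for w in weights]


# "She reads old books": "reads" is the root, "old" modifies "books".
heads = [1, -1, 3, 1]
probs = selection_probs(heads)
```

In this toy sentence the root verb gets probability zero and the modifier "old", the deepest token, gets the highest probability. A sentence-independent augmentation method (word dropout, replacement, and so on) could then sample positions from `probs` instead of using one fixed probability for every word.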

Similar resources

Syntax-aware Neural Machine Translation Using CCG

Neural machine translation (NMT) models are able to partially learn syntactic information from sequential lexical information. Still, some complex syntactic phenomena such as prepositional phrase attachment are poorly modeled. This work aims to answer two questions: 1) Does explicitly modeling target language syntax help NMT? 2) Is tight integration of words and syntax better than multitask tra...

Graph Convolutional Encoders for Syntax-aware Neural Machine Translation

We present a simple and effective approach to incorporating syntactic structure into neural attention-based encoder-decoder models for machine translation. We rely on graph-convolutional networks (GCNs), a recent class of neural networks developed for modeling graph-structured data. Our GCNs use predicted syntactic dependency trees of source sentences to produce representations of words (i.e. hi...

Algorithms for Syntax-Aware Statistical Machine Translation

All of the non-trivial algorithms that are necessary for building and applying a rudimentary syntax-aware statistical machine translation system are generalized parsers. This paper extends the “translation by parsing” architecture by adding two components that are invariably used by state-of-the-art statistical machine translation systems. First, the paper shows how a generic syntax-aware trans...

Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, syn...

Syntax-Directed Attention for Neural Machine Translation

Attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at fixed-window source words. However, alignment weights for the current target word often decrease to the left and right by linear distance centering on the a...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2023

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2023.3301214